Human-level performance
Machine learning systems should achieve human-level performance, and can even surpass human-level performance (online advertising, product recommendations, logistics, etc.)
Careful
Which humans do you choose to define human-level performance?
Why compare to human-level performance?
- Because the workflow of designing and building a machine learning system is more efficient when it mimics human-level performance.
- Because the human-level error as a proxy for Bayes error
What to do when ML is worse than humans
- get labeled data from humans
- gain insight from manual error analysis: why did a person get this right?
- better analysis of bias/variance
- if training error is much higher than human-level error -> reduce bias
- if training error is comparable to human-level error, but test error is higher -> reduce variance
What if human-level performance is bad too?
- if HLP is << 100%, it may indicate ambiguous labelling instructions or label inconsistency